Discourse-aware Statistical Machine Translation as a Context-sensitive Spell Checker

نویسندگان

  • Behzad Mirzababaei
  • Heshaam Faili
  • Nava Ehsan
چکیده

Real-word errors or context sensitive spelling errors, are misspelled words that have been wrongly converted into another word of vocabulary. One way to detect and correct real-word errors is using Statistical Machine Translation (SMT), which translates a text containing some real-word errors into a correct text of the same language. In this paper, we improve the results of mentioned SMT system by employing some discourseaware features into a log-linear reranking method. Our experiments on a real-world test data in Persian show an improvement of about 9.5% and 8.5% in the recall of detection and correction respectively. Other experiments on standard English test sets also show considerable improvement of real-word checking results.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

ارائه یک رتبه‌بند برای خطایاب معنایی با استفاده از ویژگی‌های حساس به متن

Nowadays, a large volume of documents is generated daily. These documents generated by different persons, thus, the documents contain spelling errors. These spelling errors cause quality of the documents are decrease. Therefore, existence of automatic writing assistance tools such as spell checker/corrector can help to improve their quality. Context-sensitive are misspelled words that have been...

متن کامل

A Log-Linear Block Transliteration Model based on Bi-Stream HMMs

We propose a novel HMM-based framework to accurately transliterate unseen named entities. The framework leverages features in letteralignment and letter n-gram pairs learned from available bilingual dictionaries. Letter-classes, such as vowels/non-vowels, are integrated to further improve transliteration accuracy. The proposed transliteration system is applied to out-of-vocabulary named-entitie...

متن کامل

Building a Real Word Spell Checker based on Power Links

A context-based spelling error is a spelling or typing error that turns an intended word into another word of language. Most of the methods that tried to solve this problem were depended on the confusion sets. Confusion set are collection of words where each word in the confusion set is ambiguous with the other words in the same set. the machine learning and statistical methods depend on predef...

متن کامل

Improving Finite-State Spell-Checker Suggestions with Part of Speech N-Grams

We demonstrate a finite-state implementation of context-aware spell checking utilizing an N-gram based part of speech (POS) tagger to rerank the suggestions from a simple edit-distance based spell-checker. We demonstrate the benefits of context-aware spellchecking for English and Finnish and introduce modifications that are necessary to make traditional N-gram models work for morphologically mo...

متن کامل

Statistical Machine Translation as a Grammar Checker for Persian Language

Existence of automatic writing assistance tools such as spell and grammar checker/corrector can help in increasing electronic texts with higher quality by removing noises and cleaning the sentences. Different kinds of errors in a text can be categorized into spelling, grammatical and real-word errors. In this article, the concepts of an automatic grammar checker for Persian (Farsi) language, is...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013